Comparing Real-Real, Simulated-Simulated, and Simulated-Real Spoken Dialogue Corpora
Abstract
User simulation is used to generate large corpora from which reinforcement learning can automatically learn the best policy for spoken dialogue systems. Although this approach is becoming increasingly popular, the differences between simulated and real corpora are not well studied. We build two simulation models to interact with an intelligent tutoring system. Both models are trained on two different real corpora separately. We use several evaluation measures proposed in previous research to compare our two simulated corpora with each other, the two original real corpora with each other, and the simulated corpora with the real ones. We then examine the differentiating power of these measures. Our results show that although these simple statistical measures can distinguish real corpora from simulated ones, they cannot support a conclusion about the “reality” of the simulated corpora, since even two real corpora can be very different when evaluated on the same measures.
Similar resources
Simulated Spoken Dialogue System Based on IOHMM with User History
Expanding corpora is very important in designing a spoken dialogue system (SDS). Even in this big data era, data is expensive to collect and annotations are rare. Some researchers have put considerable work into expanding corpora, mostly with rule-based methods. This paper presents a probabilistic method to simulate dialogues between human and machine so as to expand a small corpus with more varied simulated d...
Evaluating spoken dialogue models under the interactive pattern recognition framework
The new Interactive Pattern Recognition (IPR) framework has been proposed to deal with human-machine interaction. In this context a new formulation has been recently defined to represent a Spoken Dialogue System as an IPR problem. In this work this formulation is applied to define graphical models that deal with Spoken Dialogue Systems. The definition of both a Dialogue Manager and a User Model...
On-Line Learning of a Persian Spoken Dialogue System Using Real Training Data
The first spoken dialogue system developed for the Persian language is introduced. This is a ticket reservation system with Persian ASR and NLU modules. The focus of the paper is on learning the dialogue management module. In this work, real on-line training data are used during the learning process. For on-line learning, the effect of the variations of discount factor (g) on the learning speed...
Reinforcement Learning-based Spoken Dialog Strategy Design for In-Vehicle Speaking Assistant
In this paper, the simulated annealing Q-learning (SA-Q) algorithm is adopted to automatically learn the optimal dialogue strategy of a spoken dialogue system. Several simulations and experiments considering different user behaviors and speech recognizer performance are conducted to verify the effectiveness of the SA-Q learning approach. Moreover, the automatically learned strategy is applied t...